@tianleiwu tianleiwu commented Jul 24, 2025

Description

This implements the SwiGLU activation for MoE and qMoE. The activation corresponds to the Triton implementation at https://github.com/triton-lang/triton/blob/main/python/triton_kernels/triton_kernels/swiglu.py.

It also updates test_parity_moe.py to enable the qMoE tests in CI pipelines.

Motivation and Context

This is a naive implementation of the activation. Because the activation halves the length of each row, we cannot fuse it directly into the GEMM epilogue. The current implementation therefore needs an extra buffer to run the SwiGLU kernel.

In the future, we might look at alternatives that do not need an extra buffer.
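
To make the row-halving concrete, here is a minimal pure-Python sketch, not the CUDA kernel from this PR. It assumes the standard SwiGLU form `silu(gate) * linear` and an interleaved layout (gate values at even positions, linear values at odd positions). Both are assumptions for illustration: the actual kernel and the referenced Triton file may use a different layout, scaling, or clamping. The names `silu`, `swiglu_rows`, and `gemm_out` are hypothetical.

```python
import math


def silu(x: float) -> float:
    # SiLU (swish): x * sigmoid(x)
    return x / (1.0 + math.exp(-x))


def swiglu_rows(gemm_out: list[list[float]]) -> list[list[float]]:
    """Apply SwiGLU row by row; each output row is half as long as its input row."""
    result = []
    for row in gemm_out:
        if len(row) % 2 != 0:
            raise ValueError("row length must be even")
        # Assumed layout: even positions hold gate values, odd positions hold linear values.
        gate = row[0::2]
        linear = row[1::2]
        result.append([silu(g) * v for g, v in zip(gate, linear)])
    return result


# Stand-in for the output of the first expert GEMM: 2 rows of width 4.
gemm_out = [[0.0, 1.0, 2.0, 3.0], [1.0, -1.0, -2.0, 0.5]]

# The activation writes into a separate buffer: 2 rows of width 2.
activated = swiglu_rows(gemm_out)
```

Each output element depends on two input elements and the output row has a different width than the input row, which is why the sketch reads from one buffer and writes to another rather than transforming values in place.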

@tianleiwu tianleiwu changed the title [CUDA] Support SwiGlu in MoE [CUDA] Support SwiGlu in MoE and qMoE Jul 28, 2025
@tianleiwu tianleiwu merged commit a2b4546 into main Jul 28, 2025
92 checks passed
@tianleiwu tianleiwu deleted the tlwu/moe_swiglu branch July 28, 2025 22:27
sophies927 pushed a commit that referenced this pull request Aug 2, 2025
sanketkaleoss pushed a commit to sanketkaleoss/onnxruntime that referenced this pull request Aug 11, 2025
snnn pushed a commit that referenced this pull request Aug 30, 2025
tianleiwu added a commit that referenced this pull request Sep 4, 2025
@tianleiwu tianleiwu added cherry-picked Cherry-picked for a cherrypicks branch and removed release:1.23.0 labels Sep 4, 2025
jywu-msft pushed a commit that referenced this pull request Sep 5, 2025
### Description
Cherry-pick the following PRs:
#25943
#25937 
#25917
#25909
#25898
#25897
#25888
#25881
#25830
#25619
#25575
#25572
#25558
#25530
#25474
#25455
#25110

Also includes two dependent PRs for qMoE CPU:
#25877
#25822

---------

Co-authored-by: xiaomsft <[email protected]>
Co-authored-by: Xiaoyan Hu <[email protected]>
Co-authored-by: Akshay Sonawane <[email protected]>
Co-authored-by: Kunal Vaishnavi <[email protected]>
Co-authored-by: Pradeep Sakhamoori <[email protected]>
Co-authored-by: mingyue <[email protected]>
Co-authored-by: Maximilian Müller <[email protected]>
Co-authored-by: Adrian Lizarraga <[email protected]>
Co-authored-by: Dmitri Smirnov <[email protected]>
Co-authored-by: Emmanuel <[email protected]>
Co-authored-by: Emmanuel Assumang <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: praneshgo <[email protected]>
Co-authored-by: Hariharan Seshadri <[email protected]>
Co-authored-by: Jing Fang <[email protected]>
Co-authored-by: Ishwar Raut <[email protected]>